An integrated system for processing information from genealogical text
Abstract
First, we survey the nonstandard or exaggerated linguistic characteristics that English-language genealogical text (and indeed that of other languages) often exhibits. For example, English genealogical prose avoids frequent repetition of subject pronouns by simply dropping them, though this would usually be considered ungrammatical except in diaries. Genealogical text also mentions names, dates, and places frequently, and in ways that cause problems for traditional natural language processing (NLP) systems. We briefly illustrate how genealogical text in other languages likewise departs from grammatical norms, though for this talk we focus on English. We discuss how this type of prose is typically preprocessed and tokenized, and then describe how our approach is implemented as the first stage of our integrated system. The result of this preprocessing stage is to render raw genealogical text more amenable to subsequent linguistically based treatment.
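To make the two phenomena above concrete, the following is a minimal Python sketch of a preprocessing pass. It is not the system described in this abstract. The date pattern, the verb list `LEADING_VERBS`, and the functions `tokenize` and `restore_subject` are all hypothetical, chosen only for illustration. The sketch keeps a day-month-year date together as one token and prepends a subject pronoun to a sentence that opens with a bare verb.

```python
import re

# Hypothetical patterns for illustration only; not taken from the system above.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
DATE = rf"\d{{1,2}}\s+{MONTH}\s+\d{{4}}"
# Try the date pattern first so "12 Mar. 1843" is not split into pieces.
TOKEN = re.compile(rf"{DATE}|[A-Za-z]+(?:['-][A-Za-z]+)*|\d+|[^\w\s]")

# Assumed (tiny) list of verbs that often open a subject-less sentence.
LEADING_VERBS = {"born", "married", "died", "moved", "buried"}


def tokenize(text):
    """Split text into tokens, keeping day-month-year dates as one token."""
    return TOKEN.findall(text)


def restore_subject(sentence, pronoun="He"):
    """Prepend a pronoun when the sentence starts with a listed bare verb."""
    words = sentence.split()
    if words and words[0].lower() in LEADING_VERBS:
        rest = " ".join(words[1:])
        return f"{pronoun} {words[0].lower()} {rest}".rstrip()
    return sentence


if __name__ == "__main__":
    raw = "Married Mary Smith 12 Mar. 1843 in Ohio."
    print(tokenize(raw))
    print(restore_subject(raw))
```

In practice the pronoun would have to be chosen from context (the person the preceding sentences are about), which this sketch does not attempt. The default of "He" is a placeholder only.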